ABSTRACT

In this paper, we present an FPGA implementation for synchronization of SOQPSK-TG in burst-mode
transmissions. The system first detects the arrival of a new burst, after which it estimates the carrier frequency,
carrier phase, and symbol timing offsets. It is based on the synchronization algorithms
developed for the iNET preamble. We introduce several complexity reduction techniques in
order to save chip area and to minimize latency. The implementation results are shown to be very close to
the computer simulations in terms of estimation error variances and the overall bit-error rate (BER).

INTRODUCTION 

The migration toward the integrated network enhanced telemetry (iNET) system introduces a fundamental
physical layer challenge: how to acquire and lock onto (i.e., synchronize with) brief signal bursts
at a low signal-to-noise ratio (SNR). In this paper, we begin by formulating this synchronization
problem. We then show how to detect the so-called start-of-signal (SoS) condition and how to
obtain a coarse estimate of where the burst begins. Once the location of the SoS is roughly known, we formulate the remaining
task as a three-way joint problem of estimating carrier frequency, carrier phase, and symbol timing.
We provide synchronization algorithms that solve this problem, which are based on the data preamble
for the iNET system. We take care in formulating these algorithms for use in FPGA implementations.
We conclude by presenting a performance characterization of hardware prototypes of the synchronization
algorithms.


SOQPSK-TG  SIGNAL  MODEL 

In our model, we assume each burst starts with a preamble comprising $L_0$ symbols with
a duration of $T_0 = L_0 T_s$ seconds, where $T_s$ is the symbol duration. The preamble is immediately followed by the
payload carrying the information symbols. The complex baseband SOQPSK signal during transmission
of the preamble can be expressed as


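$$ s(t) = \sqrt{\frac{E_s}{T_s}} \exp\{ j\phi(t; \boldsymbol{\alpha}) \} \qquad (1) $$

where $E_s$ is the energy per transmitted symbol. The phase of the signal $\phi(t; \boldsymbol{\alpha})$ is defined as

$$ \phi(t; \boldsymbol{\alpha}) = 2\pi h \sum_{i=0}^{L_0 - 1} \alpha_i \, q(t - iT_s) \qquad (2) $$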

where $\alpha_i$ is the transmitted ternary symbol, i.e., $\alpha_i \in \{-1, 0, 1\}$, and $h = 1/2$ is the modulation index.
The waveform $q(t)$ is the phase response of SOQPSK and in general is represented as the integral of the
frequency pulse $g(t)$, which has a duration of $LT_s$. There are currently two versions of SOQPSK, defined
by their frequency pulses. The first, known as SOQPSK-MIL [1], is a full-response ($L = 1$)
scheme with a rectangular frequency pulse. The second is the telemetry group version [2],
i.e., SOQPSK-TG, which is partial-response ($L = 8$) with a custom frequency pulse. According to the
CPM definition, $q(t)$ is zero for $t \le 0$ and is $1/2$ for $t \ge LT_s$.

The SOQPSK modulator can be characterized as a precoder connected to a CPM modulator. The
precoder converts information bits $a_i \in \{0, 1\}$ to ternary symbols by means of

$$ \alpha_i = (-1)^{i+1} (2a_{i-1} - 1)(a_i - a_{i-2}) \qquad (3) $$

in order to impose OQPSK-like characteristics on the CPM signal. In the following, we perform our
analysis based on $\{\alpha_i\}$ because the preamble is fixed and known in terms of the ternary symbols. Note
that the precoder does not change the rate of the input bits, and hence $T_b = T_s$.
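The precoder in (3) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `soqpsk_precode`, the zero initial state for $a_{-1}$ and $a_{-2}$, and the 16-bit test pattern are assumptions made for the example. The pattern was chosen because, under (3), it yields a run of seven +1 symbols, a 0, seven -1 symbols, and a 0.

```python
def soqpsk_precode(bits, prev1=0, prev2=0):
    """Map bits a_i in {0, 1} to ternary symbols alpha_i using (3).

    prev1 and prev2 stand for a_{-1} and a_{-2}; starting them at 0 is an
    assumption of this sketch, not something fixed by the text.
    """
    symbols = []
    for i, a in enumerate(bits):
        sign = -1 if i % 2 == 0 else 1          # (-1)^(i+1)
        symbols.append(sign * (2 * prev1 - 1) * (a - prev2))
        prev2, prev1 = prev1, a                 # shift the two-bit memory
    return symbols


# Hypothetical 16-bit pattern, repeated twice to show periodic behaviour.
bits = [1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0] * 2
alpha = soqpsk_precode(bits)
print(alpha[:16])
```

The output length equals the input length, which is the rate-preserving property noted above ($T_b = T_s$).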

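Assuming transmission over an AWGN channel, the complex baseband representation of the received
signal is

$$ r(t) = \sqrt{\frac{E_s}{T_s}} \, e^{j(2\pi f_d t + \theta)} \, e^{j\phi(t - \tau; \boldsymbol{\alpha})} + w(t) \qquad (4) $$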


where $\theta$ is the unknown carrier phase, $f_d$ is the frequency offset, $\tau$ is the timing offset, and $w(t)$ is complex
baseband AWGN with zero mean and power spectral density $N_0$. The transmitted data symbols are denoted
by $\boldsymbol{\alpha} = [\alpha_0, \alpha_1, \cdots, \alpha_{L_0-1}]$. Our known preamble is implicit in the definition of $s(t)$. Prior to signal
detection, we need to estimate the synchronization parameters. The first step in synchronization is
frame synchronization, in which the location of the start-of-signal (SoS) is determined to within one symbol
duration. Next, we jointly estimate the frequency offset, phase offset, and symbol timing. Thus, we assume
$-T_s/2 < \tau < T_s/2$ when we are dealing with symbol timing estimation.

The proposed preamble for iNET has a length of $L_0 = 128$. This preamble is periodic and consists
of a sequence of 16 ternary symbols repeated 8 times, as follows:

$$ \begin{bmatrix} \alpha_k \\ \alpha_{k+8} \end{bmatrix} = \begin{bmatrix} 1, 1, 1, 1, 1, 1, 1, 0 \\ -1, -1, -1, -1, -1, -1, -1, 0 \end{bmatrix}, \quad k = 0, \ldots, 7. \qquad (5) $$
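As a small illustration of (5), the full 128-symbol preamble can be built by repetition. The names `PERIOD`, `L0`, and `preamble` are chosen for this sketch. The sum over one period is also computed: with $h = 1/2$, each symbol eventually contributes $\alpha_i \pi/2$ to the phase, so a zero sum means the phase returns to the same value after every 16 symbols.

```python
# One 16-symbol period from (5): seven +1, a 0, seven -1, a 0.
PERIOD = [1] * 7 + [0] + [-1] * 7 + [0]
L0 = 128                                   # preamble length in symbols

preamble = PERIOD * (L0 // len(PERIOD))    # 8 repetitions

# Net phase advance per period, in units of pi/2 (h = 1/2, q(t) -> 1/2).
net_quarter_turns = sum(PERIOD)
print(len(preamble), net_quarter_turns)
```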


SYNCHRONIZATION  ALGORITHM 


As  mentioned  earlier,  the  first  stage  is  frame  synchronization.  The  frame  synchronization  algorithm  in 
this  work  is  based  on  the  data-aided  synchronization  algorithm  of  [3],  which  is  for  general  CPM  signals. 
We only need to replace $h$, $q(t)$, and the data symbols with their SOQPSK-TG counterparts.

The  first  step  in  frame  synchronization  is  called  the  SoS  detection,  where  the  algorithm  decides  on  the 
presence  of  the  preamble  in  the  received  signal  using  the  following  test: 
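Written out from the filter description given in this section (filter inputs $r[n]r^*[n+d]$, coefficients $s^*[\ell]s[\ell+d]$, and a sum of output magnitudes), the test can be sketched as follows. The index conventions here are an assumption of this sketch rather than a transcription of the original typesetting of (6):

$$ L_D[n] = \sum_{d=1}^{D} \left| \sum_{\ell=0}^{N_p - d - 1} r[n+\ell]\, r^*[n+\ell+d]\; s^*[\ell]\, s[\ell+d] \right| > \gamma_D \qquad (6) $$

Because each term multiplies a received sample by the conjugate of a later one, a constant carrier phase cancels and a frequency offset only rotates the inner sum, which the magnitude then removes.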
Figure 1: The frame synchronization data path.


where $N_p = 128N$ is the number of samples in the preamble when $r(t)$ is sampled at $N$ samples per
symbol. Additionally, $s[n]$, for $0 \le n \le 128N - 1$, are the known samples of our preamble. In the above,
$1 \le D \le N_p$ is a design parameter that controls the complexity. The test is performed on a sliding
window over the received samples, and the SoS is detected when the statistic exceeds the test threshold
$\gamma_D$. The second step in frame synchronization is SoS estimation, where the location of
the SoS is estimated by finding the maximum of (6) over a window of $N_w$ samples after it crosses the
threshold.

A high-level block diagram of (6) is depicted in Figure 1. Based on our MATLAB simulations, we
determined that $D = 4$ performs close to $D = N_p$. Consequently, the frame synchronization algorithm
requires four complex finite impulse response (FIR) filters. We denote these filters by
FIR$_d$ for $1 \le d \le 4$. The input to FIR$_d$ is $r[n]r^*[n+d]$ or, equivalently after a delay of $d$ samples, $r^*[n]r[n-d]$. Additionally,
the coefficients of the $d$-th filter are $c_\ell = s^*[\ell]s[\ell+d]$ for $0 \le \ell < 128N - d$. We generate the
left-hand side of (6) by adding the absolute values of the FIR filter outputs. As (6) suggests, a
new value of $L_4[n]$ is produced with every incoming sample. This value is compared to $\gamma_D$ for SoS detection.

The SoS estimator observes $L_4[n]$ after it crosses the threshold for a duration equal to the preamble
length, i.e., $N_w = N_p$. This is the uncertainty window in which we expect the SoS. The SoS estimator
simply returns the index of the peak in the uncertainty window. Finally, the input samples are buffered to
accommodate the latency introduced by the frame synchronization circuit and the search window for the
SoS.
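The two frame synchronization steps described above (threshold crossing, then a peak search over $N_w = N_p$ samples) can be sketched in software as follows. This is a behavioural sketch, not the hardware data path: the functions `sos_statistic` and `detect_and_locate`, the idealized one-sample-per-symbol preamble from `toy_preamble`, the noise-free test signal, and the threshold value are all assumptions made for illustration.

```python
PHASORS = (1, 1j, -1, -1j)


def toy_preamble():
    """Idealized preamble at one sample per symbol (assumed phase states)."""
    period = [1] * 7 + [0] + [-1] * 7 + [0]
    state, samples = 0, []
    for alpha in period * 8:
        state += alpha                          # phase in quarter turns
        samples.append(complex(PHASORS[state % 4]))
    return samples


def sos_statistic(r, s, n, D=4):
    """Sum over lags d = 1..D of |sum_l r[n+l] r*[n+l+d] s*[l] s[l+d]|."""
    total = 0.0
    for d in range(1, D + 1):
        acc = 0j
        for l in range(len(s) - d):
            acc += (r[n + l] * r[n + l + d].conjugate()
                    * s[l].conjugate() * s[l + d])
        total += abs(acc)
    return total


def detect_and_locate(r, s, threshold, D=4):
    """Return the estimated SoS index, or None if no burst is detected."""
    last = len(r) - len(s)
    for n in range(last + 1):
        if sos_statistic(r, s, n, D) > threshold:
            stop = min(n + len(s), last + 1)    # search window N_w = N_p
            stats = [sos_statistic(r, s, m, D) for m in range(n, stop)]
            return n + stats.index(max(stats))
    return None


s = toy_preamble()
r = [0j] * 40 + s + [0j] * 100                  # burst starts at sample 40
peak = sos_statistic(r, s, 40)
print(detect_and_locate(r, s, threshold=0.5 * peak), peak)
```

In this noise-free toy case the threshold is crossed before the true start, and the peak search then refines the estimate, which mirrors the detection and estimation split described above.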

The hardware complexity of the frame synchronization block is proportional to the number of samples
per preamble and is dominated by the FIR filters. Therefore, the complexity can be reduced by decimating
$r[n]$. In our implementation in Figure 1, we found that a decimation factor of 2 when $N = 2$ results in
satisfactory performance at $E_s/N_0$ as low as 1 dB. More importantly, this rate of the input signal allows
the FIR filter coefficients to be selected from $\{-1, 1, -j, j\}$, which reduces the filter multipliers to
multiplexers. These coefficients are exact for the majority of the preamble, except at the transition points,
where we truncate the filter coefficients to the above values.

Figure 2: The joint frequency, timing and phase estimator.
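To illustrate why coefficients drawn from $\{1, j, -1, -j\}$ need no multipliers, the sketch below multiplies a complex sample by such a coefficient using only swaps and sign changes of its real and imaginary parts. The function name `rotate_by_code` and the 2-bit coefficient encoding are assumptions of this example, not the encoding used in the hardware.

```python
def rotate_by_code(re, im, code):
    """Multiply (re + j*im) by 1, j, -1, or -j, selected by code 0..3.

    Only swaps and negations are used, which is what a multiplexer-based
    filter tap would do in place of a full complex multiplier.
    """
    if code == 0:
        return re, im          # times  1
    if code == 1:
        return -im, re         # times  j: (a + jb) * j  = -b + ja
    if code == 2:
        return -re, -im        # times -1
    return im, -re             # times -j: (a + jb) * -j =  b - ja


print(rotate_by_code(3.0, -2.0, 1))
```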

The next stage in our synchronization algorithm is carrier and symbol timing recovery based on the
joint ML estimator of [4]. A high-level block diagram of this algorithm is presented in Figure 2.

The first step in the joint estimator is frequency estimation. First, $r[n]$ is demultiplexed at the time
instants shown in Figure 2, where $0 \le k < 8$. This results in $r_1[n]$ and $r_2[n]$. It is assumed that $r[n]$
is initially connected to $r_1[n]$. Moreover, $r_2[n]$ is fed with zero samples while $r[n]$ is connected to $r_1[n]$,
and vice versa. The frequency estimator employs two fast Fourier transform (FFT) blocks. Prior to the
computation of the FFTs, these blocks zero-pad $r_1[n]$ and $r_2[n]$ by a factor of $K_f$ in order to improve the
frequency resolution. Each of the FFT blocks generates $128NK_f$ samples, which correspond to discrete
frequencies separated by $1/(128NK_f)$ of the sampling frequency. Our investigations show that $K_f = 2$ delivers
satisfactory frequency resolution. The location of the maximum magnitude among the frequency-domain samples
corresponds to the frequency offset normalized to the sampling frequency. The frequency offset estimate is
further improved by using a Gaussian interpolator, i.e.,

$$ \hat{\nu} = \nu_0 + \frac{1}{2K_f N L_0} \cdot \frac{\log X(\nu_{-1}) - \log X(\nu_1)}{\log X(\nu_{-1}) + \log X(\nu_1) - 2\log X(\nu_0)} $$

where $\nu_0$ represents the maximizing frequency resulting from the FFT operations, and $X(\cdot)$ is the output of the
adder. $\nu_{-1}$ and $\nu_1$ denote the discrete frequency components immediately before and after $\nu_0$, respectively.

Resource Type    Number    Utilization Ratio
Slices           5,783     33%
Block RAMs       26        17%
DSP48            21        32%

Table 1: FPGA utilization results
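The peak search and Gaussian interpolation can be illustrated on a single complex tone. This sketch does not model the preamble demultiplexing or the two-FFT adder of Figure 2; it uses one direct DFT (written out in $O(N^2)$ form for clarity), and the names `dft_magnitudes` and `estimate_frequency`, the tone frequency, and the record length are assumptions of the example. Frequencies are normalized to the sampling frequency.

```python
import cmath
import math


def dft_magnitudes(x, n_fft):
    """Magnitudes of the zero-padded DFT of x (direct form, no FFT)."""
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n, v in enumerate(x)))
            for k in range(n_fft)]


def estimate_frequency(x, k_f=2):
    """Coarse peak search plus Gaussian (log-parabolic) interpolation."""
    n_fft = k_f * len(x)                        # zero padding by K_f
    mag = dft_magnitudes(x, n_fft)
    k0 = max(range(n_fft), key=mag.__getitem__)
    lm, l0, lp = (math.log(mag[(k0 + i) % n_fft]) for i in (-1, 0, 1))
    delta = 0.5 * (lm - lp) / (lm + lp - 2.0 * l0)   # offset in bins
    nu = (k0 + delta) / n_fft
    return nu - 1.0 if nu >= 0.5 else nu        # map to [-0.5, 0.5)


f_true = 0.0612                                 # hypothetical offset
tone = [cmath.exp(2j * math.pi * f_true * n) for n in range(32)]
f_hat = estimate_frequency(tone)
print(f_hat)
```

The factor 0.5 divided by the FFT length plays the role of $1/(2K_f N L_0)$ in the interpolator above, since $N L_0$ is the number of samples in the record.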

The next stage is symbol timing estimation, where $\hat{\nu}$ is used to remove the frequency offset from
$r_1[n]$ and $r_2[n]$ for $0 \le n < 128N$. Two accumulators are then employed to compute $\lambda_1$ and $\lambda_2$ from
$r_1[n]$ and $r_2[n]$, respectively. Finally, the angle of $\lambda_1^*\lambda_2$ divided by $\pi$ gives the symbol timing estimate
normalized by the symbol duration, which is denoted by $\hat{\varepsilon}$. Therefore, the symbol timing estimator's range
is limited to $\pm T_s$.
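The derotate, accumulate, and angle steps of the timing estimator can be sketched as below. This is a mechanical illustration only: how the preamble is split into the two branches is not modeled, the conjugate is placed on the first accumulator as an assumption, and the function names and test inputs are hypothetical.

```python
import cmath
import math


def derotate(r, nu_hat):
    """Remove a frequency offset nu_hat (cycles per sample) from r."""
    return [v * cmath.exp(-2j * math.pi * nu_hat * n) for n, v in enumerate(r)]


def timing_estimate(r1, r2, nu_hat):
    """Accumulate both branches, then take the angle of conj(lam1) * lam2."""
    lam1 = sum(derotate(r1, nu_hat))
    lam2 = sum(derotate(r2, nu_hat))
    eps_hat = cmath.phase(lam1.conjugate() * lam2) / math.pi
    return eps_hat, lam1, lam2


# Hypothetical branches: same tone, second branch advanced by pi/4.
nu = 0.01
r1 = [cmath.exp(2j * math.pi * nu * n) for n in range(64)]
r2 = [cmath.exp(1j * math.pi / 4) * v for v in r1]
eps_hat, lam1, lam2 = timing_estimate(r1, r2, nu)
print(eps_hat)
```

Since `cmath.phase` returns a value in $[-\pi, \pi]$, the normalized estimate lies in $[-1, 1]$, which corresponds to the $\pm T_s$ range noted above.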

The last stage is computation of the carrier phase offset. This stage requires $\lambda_1$, $\lambda_2$, and $\hat{\varepsilon}$ from the
symbol timing estimator. As in the timing estimator, the argument function has to be realized in hardware.

Similar to the frame synchronization block, we are able to reduce the hardware size by decimating $r[n]$
by a factor of 2. In particular, decimation allows us to perform smaller FFTs, which have lower latency. The side
effect of this operation is a reduction of the frequency estimation range by a factor of 2. Additionally, we need to
apply a low-pass filter (LPF) before the decimator in order to avoid aliasing. In our work, we implement
the LPF by averaging three consecutive samples at the rate of $N = 2$. This configuration is not an ideal
LPF and will cause distortion if the frequency offset is high. However, we assume that the frequency
offset is reasonably smaller than the bit rate in our application. Finally, we note that this LPF has
a delay of $T_s$, and hence the estimated symbol timing can be directly applied to the un-decimated signal
for timing recovery and demodulation.
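The averaging filter followed by decimation can be sketched as follows. The function name `average_and_decimate` and the choice of which sample phase is kept are assumptions of this sketch. The second call shows why the filter is not an ideal LPF: a tone at half the sampling rate is attenuated to one third of its amplitude rather than removed.

```python
def average_and_decimate(r, taps=3, factor=2):
    """Average `taps` consecutive samples, then keep every `factor`-th output."""
    out = []
    for n in range(taps - 1, len(r), factor):
        window = r[n - taps + 1:n + 1]      # the last `taps` samples
        out.append(sum(window) / taps)
    return out


ramp = average_and_decimate([1, 2, 3, 4, 5, 6, 7])
nyquist = average_and_decimate([1, -1, 1, -1, 1, -1, 1])
print(ramp, nyquist)
```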

FPGA  IMPLEMENTATION 

In order to develop an exact representation of the hardware, we developed a bit-precise MATLAB
model to determine the bit-widths in our design. Eight bits of precision are used to represent the received
signal: one sign bit, three integer bits, and four fractional bits, when the SOQPSK signal
has an amplitude equal to one. Based on this assumption and our MATLAB model, we identified suitable
bit-widths for the internal signals. The frequency estimate is represented using 14 bits to cover the range
$[-0.5, 0.5)$ when it is normalized to the symbol rate. Both the symbol timing and phase estimates are signed
values represented using 8 bits, including 6 fractional bits. The symbol timing is normalized to $T_s$ and the
phase estimate is normalized to $\pi$.
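The fixed-point formats described above can be mimicked in software with a small quantizer. This is a sketch under assumptions: the function name `quantize`, the use of round-to-nearest (Python's `round`, which rounds exact halves to even), and saturation on overflow are choices made for the example; the text does not state how the hardware rounds or handles overflow.

```python
def quantize(x, int_bits=3, frac_bits=4):
    """Quantize x to a signed fixed-point value: 1 sign + int_bits + frac_bits."""
    scale = 1 << frac_bits                       # 2**frac_bits steps per unit
    lo = -(1 << (int_bits + frac_bits))          # most negative integer code
    hi = (1 << (int_bits + frac_bits)) - 1       # most positive integer code
    code = max(lo, min(hi, round(x * scale)))    # round, then saturate
    return code / scale


# Received-signal format: 1 sign, 3 integer, 4 fractional bits.
print(quantize(0.3), quantize(100.0), quantize(-100.0))
# Timing/phase format: 8 bits with 6 fractional bits.
print(quantize(0.26, int_bits=1, frac_bits=6))
```

With four fractional bits the step size is 1/16, so a unit-amplitude signal is resolved into 16 levels per unit while the three integer bits leave headroom for noise.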

The proposed burst-mode architecture is written in VHDL and verified using ModelSim. The VHDL
design is implemented on a Xilinx Virtex 5 110xt FPGA with a speed grade of -1. The implementation
results are presented in Table 1. Our implementation consumes about a third of the FPGA.
Much of the saving in FPGA area is due to the decimation of the received signal, which
leaves enough room for future implementation of the demodulator and the decoder. Furthermore, the
implemented design is capable of running at up to 100.878 MHz.

We study the synchronization performance of our FPGA implementation using test vectors generated
by MATLAB. First, we investigate the performance of the joint frequency, symbol timing, and phase
estimator by providing the FPGA with preambles that have known frequency, timing, and phase offsets,
in addition to additive white Gaussian noise (AWGN). The estimated values from the FPGA are sent
back to the host PC and compared with their original values. In this setup, we bypass the frame
synchronization step in order to observe the performance of the joint estimator alone. The error variances
corresponding to the frequency, timing, and phase are plotted in Figures 3, 4, and 5, respectively. In this set
of plots, we run the simulations for 20,000 bursts at each signal-to-noise ratio (SNR) value. Moreover, we
provide estimation results from MATLAB where no simplification or quantization is present. The
Cramér-Rao bound (CRB) is also included, which represents the lower bound on the estimation error variance. It
is observed that the FPGA implementation performs very close to the MATLAB results as far as frequency
and phase estimation are concerned. However, we see some degradation in the symbol timing estimate.
This is mainly due to the down-sampling and our non-ideal LPF, which cause signal distortion. Despite
this, our simulations show that SOQPSK-TG demodulation is almost unaffected by timing errors as large as
$T_s/4$, which is considerably larger than the timing error standard deviation delivered by our design.

Figure 3: The normalized frequency error variance of the joint ML estimator. The frequency is normalized to the
symbol rate.

The overall performance of our system is tested by simulating a burst-mode receiver where the FPGA
performs the synchronization task and the demodulation is carried out in MATLAB on the host PC. Each
burst consists of the 128-bit known preamble and 6204 random bits as the payload. In order to emulate
a burst-mode scenario, we add a guard time of 200 bit intervals between bursts in which only AWGN is
present. However, the duration of this guard interval is unknown to the receiver. Moreover, we introduce
a random frequency offset (up to 0.2 times the symbol rate), a random carrier phase offset, and a random
symbol timing offset. The generated signal is sent to the FPGA, and the detected bursts along with the
estimated values are sent back to the host PC for demodulation and bit error rate (BER) computation. The
BER results for this setup are plotted in Figure 6. The results are also compared against two other
scenarios: one where the synchronization is performed in MATLAB, and a perfect synchronization scenario,
i.e., where only AWGN is present. It is observed that the FPGA synchronization has almost identical
performance to the MATLAB results. This validates our simplifications and choice of bit-widths
in VHDL. Furthermore, we observe an SNR loss of less than 0.5 dB in all regions. This small SNR loss is
of great importance in the low SNR region because synchronization does not become the limiting factor
for modern error correction schemes, which are capable of correcting error bits at $E_s/N_0 = 1$ dB.

Figure 4: The normalized timing error variance of the joint ML estimator. The timing error is normalized to the
symbol duration.

CONCLUSIONS 

In this paper, we have described an FPGA implementation for burst-mode synchronization of SOQPSK-TG.
The proposed architecture has a feedforward structure and is designed based on the synchronization
preamble of iNET. It detects the arrival of a new burst, after which it estimates frequency offset, symbol
timing, and carrier phase. Our test results demonstrate that the error variances of the synchronization
parameters generated by the hardware are very close to those of our MATLAB implementation. Additionally,
we simulated a burst-mode SOQPSK-TG receiver in which the synchronization is executed on the FPGA.
The BER results of such a system show an SNR loss of less than 0.5 dB compared to a receiver with
perfect synchronization.

Figure 5: The joint frequency, timing and phase estimator


Figure  6:  The  BER  performance  of  a  burst-mode  SOQPSK-TG  receiver  that  employs  our  synchronization  system. 